Data Placement Analysis for A Distributed Heterogeneous High Performance Computing Environment
Author
Abstract
Heterogeneous Associative Computing (HAsC) is a new distributed heterogeneous computing paradigm that combines associative computing, superconcurrency, and distributed heterogeneous computing. Its task-mapping and execution decisions are based on data locality along with a variety of system, data, and algorithmic factors. In this paper, we present Data Placement Analysis (DPA), an effective and efficient method for placing the right data set on suitable machines for execution. Thus, when incorporated into HAsC task mapping, DPA strongly influences the machine selection / task mapping process. The effectiveness of DPA is then shown with experimental results.

1.0 Introduction

Distributed heterogeneous high performance computing, also called metacomputing (Smarr and Catlett 1992), is a model in which a single program's tasks are distributed among a network of heterogeneous machines. Although this network of machines is viewed by the user's program as a single unified virtual machine (Scott and Potter 1994), the underlying heterogeneous computing paradigm must ultimately perform task placement at the physical machine level. Current techniques used for task placement in heterogeneous paradigms fall well short of informed task placement. Existing task placement techniques tend to be more concerned with processor (machine) utilization or load balancing than with overall program throughput. Granted, these load-balancing methods are rather effective when employed for a group of distributed homogeneous machines: in a homogeneous environment, as processor utilization increases, tasks are evenly distributed (load balanced) among the available processors, and overall program throughput increases as a direct side effect. In a heterogeneous environment, however, one should not consider high processor utilization or load balancing to be a good measure of a program's effectiveness.
In a heterogeneous environment, this processor-utilization / load-balancing technique may result in code being executed on a machine that lacks the capability to perform it efficiently and effectively. Simply put, if one's goal is merely to maximize machine utilization in a heterogeneous environment, one need only send each machine those tasks least suited to that particular machine's architecture type. Even without experimentation, this conclusion may be drawn as an extension of Amdahl's law. As an example, consider performing very large data-parallel tasks, such as image processing, on a sequential machine, while performing simple sequential scalar arithmetic on a SIMD machine. The sequential machine will be churning through each data point of the image while the SIMD …
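The paper does not reproduce DPA's internal algorithm at this point, but the core idea of the example above can be sketched: match each task's computational profile to the machine type best suited for it, rather than balancing load blindly. The following is a minimal illustrative sketch; the machine types, the `SPEED` table, and the `place` function are hypothetical stand-ins, and the cost figures are invented purely for illustration.

```python
# Illustrative sketch only -- not the paper's DPA algorithm.
# Idea: pick the machine with the lowest estimated completion time for a
# task, instead of the least-loaded machine.

# Hypothetical estimated seconds per unit of work, by (machine, task type).
SPEED = {
    ("SIMD",       "data_parallel"): 0.01,
    ("SIMD",       "scalar"):        5.0,
    ("sequential", "data_parallel"): 2.0,
    ("sequential", "scalar"):        0.1,
}

def place(task_type, work_units, machines):
    """Pick the machine with the lowest estimated completion time."""
    return min(machines, key=lambda m: SPEED[(m, task_type)] * work_units)

machines = ["SIMD", "sequential"]
print(place("data_parallel", 10_000, machines))  # -> SIMD
print(place("scalar", 10_000, machines))         # -> sequential
```

A pure load balancer would happily invert these assignments once the "right" machine became busy; a placement model like the one sketched keeps architecture affinity in the decision.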
Similar Resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop distributed file system's rack-aware data placement strategy assumes a homogeneous Hadoop cluster in which each node has the same computing capacity and each node is assigned the same workload. Default Hadoop d...
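The mismatch described above suggests an obvious remedy: distribute data blocks in proportion to each node's computing capacity instead of uniformly. The snippet below is a hypothetical sketch of that idea, not Hadoop's actual placement policy or the cited paper's algorithm; the node names and capacity ratios are invented.

```python
# Hypothetical capacity-proportional block placement sketch (not the
# actual HDFS policy): faster nodes receive proportionally more blocks.

def proportional_placement(num_blocks, capacities):
    """Assign data blocks to nodes in proportion to each node's capacity."""
    total = sum(capacities.values())
    shares = {n: int(num_blocks * c / total) for n, c in capacities.items()}
    # Hand out blocks lost to integer rounding, fastest nodes first.
    leftover = num_blocks - sum(shares.values())
    for n in sorted(capacities, key=capacities.get, reverse=True)[:leftover]:
        shares[n] += 1
    return shares

caps = {"fast": 4.0, "medium": 2.0, "slow": 1.0}
print(proportional_placement(700, caps))  # {'fast': 400, 'medium': 200, 'slow': 100}
```

Under uniform (rack-aware, homogeneity-assuming) placement each node would hold ~233 blocks, leaving the slow node a straggler; the proportional split matches data volume to processing rate.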
Data Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems such as data grids, cloud computing provides these factors in a more affordable, scalable, and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
Communication-Aware Traffic Stream Optimization for Virtual Machine Placement in Cloud Datacenters with VL2 Topology
With the pervasiveness of cloud computing, a colossal number of applications from large organizations increasingly rely on cloud services. These demands cause a great number of applications, in the form of sets of virtual machine (VM) requests, to be executed on data centers' servers. Some applications are too big to be processed on a single VM. Also, there exist severa...
Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment
Big data mining methods support knowledge discovery on highly scalable, high-volume, and high-velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages homogeneous and heterogeneous computing models. The MapReduce fra...
Attributed Consistent Hashing for Heterogeneous Storage System
Cloud-scale storage systems are an important building block of the cloud infrastructure. They demand the flexibility to distribute data and provide high I/O performance. The consistent hashing algorithm is widely used in large-scale parallel/distributed storage systems for its decentralized design, scalability, and adaptability. It can evenly distribute data among nodes but lacks efficiency in a heterog...
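One common way to adapt consistent hashing to heterogeneous capacities (the cited paper's exact "attributed" scheme is not reproduced here) is to give larger nodes more virtual points on the hash ring. The sketch below illustrates that weighted-virtual-node idea; the class name, node names, and weights are hypothetical.

```python
# Minimal weighted consistent-hashing sketch: heterogeneous capacity is
# modeled by giving bigger nodes more virtual points on the hash ring.
import bisect
import hashlib

def _h(key):
    """Map a string key to an integer position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class WeightedRing:
    def __init__(self, nodes):  # nodes: {name: integer weight}
        # Each node gets 100 * weight virtual points on the ring.
        self.ring = sorted((_h(f"{name}#{i}"), name)
                           for name, w in nodes.items()
                           for i in range(100 * w))
        self.keys = [h for h, _ in self.ring]

    def locate(self, key):
        """Return the node owning the first ring point at or after hash(key)."""
        i = bisect.bisect(self.keys, _h(key)) % len(self.ring)
        return self.ring[i][1]

ring = WeightedRing({"big": 4, "small": 1})
counts = {"big": 0, "small": 0}
for k in range(10_000):
    counts[ring.locate(f"obj-{k}")] += 1
print(counts)  # roughly 4:1 in favour of "big"
```

Adding or removing a node still remaps only the keys adjacent to its virtual points, preserving consistent hashing's incremental-rebalancing property while respecting capacity differences.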
Publication date: 1995